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FOREWORD 

This  report  summarizes  the  work  accomplished  under  Phase  I  of  the  Dynamic  Image 
Disparity  Analyzer  (DIDA)  program.  This  in-house  effort  was  conducted  by  the  Advanced 
Systems  Research  Group  (AAAT-3) ,  Information  Processing  Technology  Branch  (AAAT) , 
Avionics  Laboratory  (AFWAL/AA),  Wright-Patterson  AFB,  Ohio,  under  Project  2003,  Task 
06,  Work  Unit  51  . 

The  research  was  performed  during  the  period  March  1981  to  November  1983. 
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SECTION  I 
INTRODUCTION 

This  report  summarizes  the  work  accomplished  under  Phase  I  of  the  Dynamic  Image 
Disparity  Analyzer  (DIDA)  program  from  March  1981  to  November  1983.  Image  disparity 
analysis  is  the  determination  of  geometrical  differences  between  two  or  more  images 
caused  by  binocular  parallax,  camera  motion,  object  motion,  or  some  combination  of 
these.  Disparities  between  images  can  be  represented  as  a  vector  field  mapping  one 
image  into  the  corresponding  points  of  the  other  image.  The  disparity  field  can 
provide  important  3-D  information  about  structure  and  motion  which  is  impossible  or 
very  difficult  to  derive  from  a  single  image. 

The  goal  of  the  DIDA  program  was  to  develop  advanced  real-time  systems,  as 
depicted  in  Figure  1.  Dynamic  image  analysis  embraces  two  distinct  areas:  (1)  the 
correspondence  problem  and  (2)  interpretation.  This  report  (and  most  research  in  the 
literature)  deals  with  the  first  area,  the  correspondence  problem  of  associating 
corresponding  structures  in  different  images;  i.e.,  the  determination  of  the 
disparity  field.  The  interpretation  problem  is  concerned  with  transforming  measured 
disparity  fields  derived  from  2-D  images  into  physical  parameters  related  to  3-D 
structure  and  motion  of  objects.  Interpretation  procedures  are  poorly  understood. 

Current  papers  on  the  interpretation  problem  assume  solutions  to  the 
correspondence  problem  which  unfortunately  are  not  always  possible.  Theory  and 
formalism  are  presently  in  a  state  of  flux.  Journal  articles  are  usually  less  than 
ten  years  old  in  this  area,  and  there  are  no  comprehensive  review  articles  or  text 
books.  One  encounters  major  difficulties  approaching  an  interpretation  problem  with 
no  a  priori  knowledge.  The  basic  reason  for  the  difficulties  is  that  the  general 
problem  deals  with  non-linear  equations  of  enormous  complexity.  These  equations  have 
multiple  answers  and  require  numerical  analysis  to  achieve  approximate  solutions. 
These  numerical  methods  may  converge  to  a  wrong  answer  and  are  sensitive  to 
initializing  parameters.  Additional  constraints  and  other  a  priori  knowledge  are 
often  utilized  to  simplify  both  disparity  analysis  and  the  difficult  non-linear 
interpretation  problems.  It  is  too  early  now  to  anticipate  the  final  formulations  of 
and  practical  solutions  to  interpretation  problems. 
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Figure  1.  Dynamic  Image  Analyzer 
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The  type  of  information  derivable  from  a  disparity  field  includes  depth, 
surf-ace  orientation,  object  boundaries,  camera  motion,  and  object  motion. 

Derivation  of  this  physical  data  from  sensor  inputs  is  fundamental  to  many  military 
applications,  including  automatic  target  detection,  recognition,  and  tracking,  and 
passive  navigation  with  stereo-motion  imagery.  Currently,  there  is  much  research  on 
the  optical  flow  (continuous  field  of  disparities)  problem  which  seeks  to  derive  an 
approximate  model  of  the  earth's  surface  from  a  sequence  of  images  taken  from  a 
flying  sensor.  Even  a  crude  surface  model  provides  passive  ranging  which  is 
extremely  useful  in  determining  whether  a  blob  in  the  image  is  the  proper  size  for 
selected  targets.  At  the  1983  DARPA  Image  Understanding  Workshop  nearly  half  the 
papers  dealt  with  the  optical  flow  problem,  which  is  one  of  the  relatively  easy 
interpretation  problems. 

The  various  successful  approaches  to  the  correspondence  problems  are  reviewed  and 
discussed  in  Section  II.  One  conclusion  of  this  review  is  that  these  approaches  do 
not  form  an  adequate  basis  for  developing  a  general  purpose  real-time  high  density 
disparity  analyzer.  In  the  intial  phase  of  the  DIDA  program,  research  efforts  were 
mainly  concerned  with  developing  improved  algorithms  for  disparity  analysis.  The 
overall  DIDA  program  plan  and  other  research  done  in  support  of  it  are  discussed  in 
Section  III.  Theoretical  results  derived  by  the  author  are  presented  in  Sections  IV 
and  V. 
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SECTION  II 
TUTORIAL 

The  ideal  solution  to  the  correspondence  problem  is  found  in  biological  vision 
systems  which  are  still  a  mystery.  Nature  uses  a  highly  parallel  network  of  simple 
processors  which  communicate  with  their  nearest  neighbors.  The  system  extracts  low 
level  features  from  receptor  outputs  in  the  retina  and  weaves  these  local  features 
into  a  global  perceptive  tapestry.  The  determination  of  a  universal  disparity 
algorithm  is  a  very  difficult  and  unsolved  problem.  These  approaches  have  been 
developed  which  can  extract  disparity  information  within  their  respective  domains. 
These  approaches  are  referred  to  as  (1)  differencing  techniques,  (2)  spatial  -  temporal 
gradient  analysis,  and  (3)  feature  matching. 

1.  DIFFERENCING  TECHNIQUES 

Differencing  techniques  provide  for  a  point  by  point  determination  of 
significant  changes  in  a  sequence  of  images  (Reference  1,  2,  3,  and  4). 

Differencing  requires  registered  images  and  is  therefore  not  suitable  for  moving 
camera  applications.  Early  techniques  were  based  on  thresholding  simple  temporal 
differences: 


I (X,Y,T)  -  I (X, Y,T  -  AT)  >  e 


Current  techniques  are  more  elaborate  neighborhood  functions  such  as 


t<s„  ♦  s.r  +  (m„  -  n,n 


2i2 


>  t 


Sc  *  SR 


(1) 


(2) 


where  M  and  S  a  mean  and  variance  for  sample  areas  from  current  (C)  and  reference 
(R)  image  frames.  Thresholding  results  in  binary  difference  images  which  essentially 
act  as  motion  detectors.  Disparity  fields  are  not  measured  directly.  In  addition 
to  image  registration,  occlusion  problems  and  boundary  definition  require 
sophisticated  algorithms  to  track,  integrate,  and  interpret  a  sequence  of  binary 
Images.  Application  of  this  approach  Is  suited  to  Initial  phases  of  dynamic  image 
analysis  for  initial  estimates  of  velocity  field  clusters  and  segmentation  estimates. 
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2.  SPATIAL/TEMPORAL  GRADIENT  ANALYSIS 

The  second  approach  combines  the  intensity  gradient  and  temporal  gradient  at 
each  pixel  in  the  image;  i.e.,  vl  and  <$I/it  (References  5,  6,  and  7).  The  basic 
assumption  made  i-n  this  approach  is  that  the  brightness  of  a  displaced  patch  of  a 
moving  object  remains  constant: 

I(X,Y,t)  =  I (X+AX ,  Y+iY,  t+it)  (3) 

where  the  patch  undergoes  a  displacement  of  X,  Y  in  t.  With  this  assumption,  one 
can  derive  a  velocity  constraint  which  relates  the  velocity  to  changes  in  intensity 
over  both  time  and  space: 


<5 1  iX  +  5  Y  +  =0  ( 4 ) 

iX  it  iY  it  it 

where  higher  order  derivatives  in  the  Taylor  expansion  are  suppressed  in  the  limit 
At-O.  The  gradient  and  velocity  vectors  are  given  by 

VI  =  Ci I/c  X,  5  1/5  Y]  (5) 


and 


V  =  [6X/5t,  3Y/6t]  (6) 

where 

AX/At  ♦  iX/it,  AY/At  -  5Y/5t,  and  A  I/At  ♦  6l/it. 

An  analysis  of  higher  order  terms  in  the  Taylor  expansion  was  made  in  a  study  of 
error  reduction  techniques  by  Thompson  and  Kearney  (Reference  7). 

The  basic  constraint  equation  can  be  rewritten  as  a  vector  equation: 

71  •  V  +  5 I/it  *  0.  (7) 
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Note  that  the  constraint  equation  gives  only  the  component  of  the  velocity  vector 
(or  optical  flow)  which  is  parallel  to  the  gradient  vector.  The  optical  flow  field, 
therefore,  cannot  be  computed  at  a  point  independent  of  neighboring  points. 

Additional  constraints  are  necessary  to  find  the  perpendicular  component  of  the 
velocity  vector.  Two  reported  ways  of  augmenting  the  basic  constraint  equation  are 
the  clustering  approach  and  the  using  of  a  smoothness  constraint. 

The  clustering  approach  is  similar  to  a  Hough  transform  and  suitable  for 
segmenting  a  sequence  of  images  with  multiple  rigid  body  translations  and  in  which 
there  is  no  camera  motion.  The  unknown  velocity  component  vj_  lies  on  a  constant 
line  perpendicular  to  the  vii  component.  Several  constraint  lines  are  shown  in 
Figure  2.  In  general,  the  constraint  lines  of  neighboring  points  will  not  be 
col  inear  and  therefore  will  intersect.  Over  the  cluster  of  intersecting  points  some 
average  is  made  to  estimate  v_L.  A  pre-smoothing  operation  to  remove  noise  and  high 
frequency  texture  variations  generally  improves  velocity  estimates.  Thompson  and 
Barnard  (Reference  8)  described  a  "pseudo  inverse"  method  which  is  equivalent  to 
least  square  error  analysis  and  similar  to  clustering  analysis.  These  methods  have 
been  successfully  tried  on  real  images.  They  involve  relatively  complex  computations 
and  cannot  deal  with  rotational  motion  which  generates  a  continuum  of  velocity 
directions. 

A  smoothness  constraint  is  obtained  from  a  measure  of  the  departure  from 
smoothness  in  the  velocity  flow,  such  as: 

Ec  =  [  (£U/<5X  )2  +  (<5U/<5Y)2]  +  [(6V/6X)2  +  (6V/6Y)2]  ’  (8) 

where  u,v  are  the  x,y  velocity  components  (Reference  5  and  6).  Let 

Eb=  tl  u  +  J7  (basic  velocity  constraint)  (9) 

Constraint  equation  are  obtained  by  minimizing 

E  =  /  /  (a  E2  +  E2)  dXdY  (10) 
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using  the  calculus  of  variation.  This  particular  measure  results  in  a  set  of 
iterative  equations: 

OD 
(12) 
03) 


Un+1  =  if1  -  (6I/5X)  •  Efa/(a2  +  G2)  , 
Vn+1  =  V*1  -  ( S I / <5 Y )  •  Eb/(a2  +  G2)  * 
G2  =  (5I/6X)2  +  (6 1/5 Y  )2 


where  the  neighborhood  averages  indicated  by  bars  result  in  a  "fill-in"  feature  for 
uniform  regions.  That  is  to  say,  regions  with  zero  gradients  have  no  local 
information  to  constrain  velocity.  In  this  case,  velocities  are  taken  to  be 
neighborhood  averages.  Eventually,  velocities  in  non-uniform  regions  or  border 
regions  will  propagate  inwards  and  fill  in  ambiguous  regions.  This  corresponds  to 
obtaining  a  solution  to  the  Laplace  equation  with  given  boundary  conditions.  So  far, 
this  approach  has  been  applied  mainly  to  synthetic  images.  It  wcrks  best  when  the 
gradient  is  not  too  small  and  varies  in  direction  from  point  to  point.  The 
smoothness  assumption  breaks  down  at  discontinuities  which  may  prove  to  be  a  major 
flaw  in  this  approach.  Heuristics  have  been  proposed  for  establishing  motion 
boundaries  using  maximum  velocity  and  orientation  similarity. 

The  foregoing  approach  was  reviewed  for  two  reasons:  first,  to  present  a 
technique  which  seeks  to  find  a  disparity  vector  for  each  pixel  and  secondly,  to 
dispel  any  inference  that  the  computation  of  the  optical  field  is  a  closed 
problem. 

3.  MATCHING  TECHNIQUES 

There  are  many  different  approaches  to  image  matching,  depending  on  what  is  to 
be  matched  and  how  it  Is  to  be  matched.  The  raw  data  of  image  analysis  is  a  matrix 
of  image  intensities.  Hence,  Intensity  matching  using  cross-correlation  has  been 
extensively  studied  and  utilized.  In  Image  analysis,  raw  intensity  data  is 
transformed  Into  various  representations  characteristic  of  one  or  more  features. 
Features  are  derived  from  the  spatial  variations  and  relationships  of  the 
distribution  of  features  or  attributes  such  as  position,  size,  orientation,  color, 
adjacencies,  etc.  Symbolic  matching  Involves  less  matching  ambiguity  because  complex 
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features  are  unique.  Another  advantage  is  that  they  are  more  invariant  to 
substantial  and  arbitrary  global  changes  and  orientation..  A  typical  strategy  of  many 
symbolic  matching  algorithms  is  the  coarse-fine  strategy  in  which  one  selects  and 
pairs  the  most  obvious  matching  features  initially.  Next,  these  initial  matches  are 
used  to  find  approximate  coordinate  transformations  and  scaling  factors.  Using  this 
transformation,  the  remaining  segments  are  modified  and  matched,  the  transformation 
then  refined,  etc.  Thus,  one  can  see  that  symbolic  matching  can  work  with  large 
rotations  which  foil  other  approaches. 

Simple  or  primitive  features  such  as  points,  micro-edges,  blobs,  and  texture 
measures  are  local  functions  of  the  intensity  values  over  a  small  neighborhood  of 
pixels  (picture  elements).  Computation  of  local  features  is  amenable  to  pipeline 
and  parallel  processing  techniques.  An  image  matching  algorithm  is  outlined  in 
Section  V  which  features  an  adaptive  blob-matching  approach  for  deriving  dense 
disparity  fields.  Simple  features,  such  as  points,  have  multiple  possible  matches 
and  require  additional  measures  to  reduce  ambiguity.  With  n  elements  selected  in 
each  image,  there  are  n!  possible  one-to-one  mappings.  These  measures  are 
illustrated  by  the  label  relaxation  technique  for  matching  point  sets,  which  is 
described  below.  Point  matching  will  be  discussed  in  some  detail  because  of  its 
relationship  to  work  done  on  "interest  measures"  under  the  DIDA  in-house  research 
effort. 

The  following  point  matching  algorithm,  which  assumes  limits  for  the  amount  of 
displacement  and  rotation  between  images,  has  been  described  by  Thompson  and  Barnard 
(Reference  8  and  9): 

a.  Use  "Interest  Operator"  to  locate  unique  set  of  points  in  each  image. 

b.  Assign  labels  and  probabilities  to  candidate  matches. 

c.  Use  label  relaxation  technique  to  increase  local  consistency  and  reduce 
matching  ambiguity. 
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4.  INTEREST  OPERATORS 

There  is  no  generally  preferred  local  method  for  identifying  the  same  set  of 
points  in  multiple  images.  The  matchability  of  any  point  depends  on  the  intensity 
values  of  neighboring  points.  A  uniqueness  measure  for  each  pixel  is  a  function  of 
intensity  values  over  a  small  window  about  each  pixel.  Points  with  the  highest 
measure  in  their  local  region  are  selected  as  “interesting"  points  which  have  the 
highest  likelihood  of  being  matched.  One  early  measure  is  simply  the  statistical 
variance  of  image  intensities  within  the  window: 

2 

VAR  =  MEAN  ( I (i  ,j)  -  I(i,j)  }  (14) 


where 

I  =  MEAN  (1(1  ,j)  ). 

In  order  to  improve  localization  of  selected  points,  other  operators  were 
developed  which  are  more  sensitive  to  the  crossings  of  multiple  linear  features.  Two 
popular  window  measures  are  based  on  the  means  of  the  square  of  intensity 
differences  between  adjacent  pixels  in  four  different  directions: 

0(1)  =  MEAN  (I(i,j)  -  I(i+l,j)  )2 

0(2)  =  MEAN  (I(i,j)  -  I(i,j+1)  )2 

(15) 

0(3)  *  MEAN  (I(i,j)  -  I(i+l,j+l)2 
0(4)  =  MEAN  (1(1+1, j)  -  1(1, J+l)  )2 

The  Moravec  interest  measure  (Reference  10  and  11)  is  the  directed  variance: 

0IRVAR  -  MIN  CD(1),  D(2),  0(3),  (16) 

The  second  measure  Is  the  Hannah  edge  variance  (References  12  and  13): 

EVAR  «  VAR  *  MIN  [  {j^j,  ]  (17) 
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Both  the  Moravec  and  Hannah  interest  measures  seek  out  the  direction  associated 
with  minimum  difference.  If  the  minimum  is  large,  the  differences  in  all  directions 
are  large;  hence,  the  features  selected  tend  to  be  associated  with  multi-directional 
attributes.  This  in  turn  insures  that  the  feature  is  localized  and  not  dominated  by 
a  difference  due. to  a  single  linear  feature. 

Point  selection  or  interest  operators  can  be  defined  as  local  maxima  of  an 
interest  measure.  A  generalized  treatment  of  point  selection  operators  will  be 
presented  below  based  on  gradient  statistics.  It  will  be  shown  that  the  computation 
of  the  Moravec  and  Hannah  operators  can  be  improved  and  simplified.  A  new  set  of 
point  selection  operators  are  presented  for  future  experimentation  and  evaluation. 

5.  RELAXATION  FORMULATION 

The  relaxation  labeling  process  can  be  broken  down  into  two  phases: 
initialization  and  iteration.  The  discussion  follows  that  of  W.  Thompson  (Reference 

8).  The  initialization  phase  begins  after  a  set  of  unique  points  have  been  selected 

from  each  image.  The  problem  is  to  match  the  points  in  one  image  called  the 

reference  image  with  the  corresponding  points  in  the  second  image.  For  each  point 

"i"  in  the  reference  image,  one  constructs  a  label  list  L1  *C  1^,  1^].,  where  1^  * 
x .  -  Xj '  is  the  vector  displacement  between  point  "i"  in  the  reference  image  and 
point  "j"  in  the  second  image.  Each  l,.  is  a  candidate  match  or  disparity  vector 
between  two  points.  1.  is  a  special  no-match  label  for  point  "i". 

The  next  step  is  to  assign  initial  probabilities  p(l.j),  pO^*)  to  each  label 
based  on  local  singularity  measures  (abbreviated  p.^.,  p^).  The  initial  probabilities 
p°(l.j)  are  based  on  correlations  between  a  small  reference  window  from  one  image 
centered  on  x^  and  a  window  from  the  second  image  centered  at  (xi  +1^).  The  sum  of 
the  squares  of  the  differences  between  windows  has  been  used  as  the  measure  for  the 
amount  of  correlation.  Many  potential  matches  can  be  eliminated  if  an  upper  limit  on 
the  magnitude  for  all  the  disparity  vectors  is  assumed. 


11 


AFWAL-TR-85-1011 


A  consistency  measure,  ,  is  the  sum  of  probabilities  P^v  for  all  labels  1^ 
closely  aligned  with  liu  and  in  the  same  neighborhood: 

qiy  “  11  pKv  ekv  ( 

where  eKv  is  equal  to  one  if  both  of  the  following  conditions  are  satisfied  and 
equal  to  zero  otherwise. 

(lj  Condition  for  point  "i“  to  be  near  to  point  “k“: 

max  [|x.  -  xK| ,  \y.  -  yj  ]  ±  R  (constant).  (' 

(2)  Condition  for  to  be  closely  aligned  to  1^: 

Mjy  -  lKv!  <  e  (constant).  (2 

The  consistency  measures  are  used  in  turn  to  update  the  label  probabilities. 


.  Pfj  *  (A  *  B  *  ,‘j) 


"t+1*  t* 

P.  =  P. 

1  1 


represent  the  new  unnormalized  probabilities  where  A  and  B  are  adjustable  gain 
parameters  (typically  A  *  0.3  and  B  *  3).  To  normalize,  let 

P1*1  -  P1*'  /  (  Z  Pltl ). 


Note  that  the  probability  of  the  no  match  label  is  affected  only  by  this 
normalization  step. 
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6.  IMPLEMENTATION 

Current  implementation  of  matching  systems  focuses  mainly  on  stereo  systems, 
either  binocular  or  motion  (monocular)  stereo.  All  these  approaches  are 
characterized  by  point  or  edge  matching  between  images  taken  at  approximately  the 
same  time  and  perspective.  In  stereo  matching,  the  possible  disparities  are  tightly 
constrained.  Here  the  matching  problem  requires  a  dense  grid  of  points  making  the 
matching  problem  considerably  more  difficult  than  determination  of  distance.  Some 
systems  concerned  with  automatic  or  robotic  navigation  will  trade  off  density  for 
accuracy.  Sub-pixel  accuracy  is  sought  using  point  matching  and  cross  correlation. 
At  the  time  of  this  report,  there  are  no  complete  real-time  systems.  However,  a 
near  real-time  prototype  matching  system  is  being  developed  at  MIT,  which  produces  a 
16  by  16  array  of  depth  measurements  every  15  seconds  from  a  vidicon  camera.  For 
more  information,  see  the  excellent  survey  article  by  Barnard  and  Fischer, 
“Computational  Stereo  From  an  IU  Perspective"  in  the  Proceedings:  DARPA  Image 
Understanding  Workshop  (April  1981). 

Optical  techniques  are  not  included  in  this  survey. 
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SECTION  III 
OIDA  RESEARCH  PROGRAM 

Dynamic  and  3-0  image  analysis  is  a  rapidly  emerging  research  area.  Real-time 
systems,  however,  are  slowly  emerging  because  current  algorithms  have  limitations. 
This  area  of  research  is  difficult  because  it  includes  automatic  image  understanding, 
non-linear  adaptive  scene  analysis,  enormous  data  manipulation  problems,  and 
knowledge  base  design  problems  beyond  current  methodology.  This  research,  which  is 
basic  to  developing  robust  computer  vision,  will  certainly  draw  upon  and  enhance  our 
understanding  of  biological  vision  systems. 

The  in-house  OIDA  program  was  initiated  at  AFWAL/AAAT  in  the  early  part  of 
1981.  OIDA  is  an  acronym  for  Dynamic  Image  Disparity  Analyzer.  The  long  term 
objectives  of  the  "^DA  program  were  to  focus  research  in  the  area  of  real-time  vision 
systems  and  to  develop  requisite  in-house  expertise  and  processing  facilities.  The 
initial  objective  was  to  conduct  research  applicable  to  the  development  of  a  real¬ 
time  disparity  analyze'.  Figure  3  depicts  the  DIDA  Program  Plan.  This  technical 
report  summarizes  the  work  accomplished  in  Phase  I  of  the  program.  The  DIDA  program, 
if  support  ,  would  stimulate  important  main  stream  research  and  contribute  to 
advanced  image  understanding  technology  for:  robotics,  passive  navigation,  photo 
analysis,  target  detection,  recognition,  and  tracking,  and  multi-sensor  management 
techniques. 

The  work  in  Phase  I  was  accomplished  by  AFWAL/AAAT  personnel,  two  AFIT 
students,  and  an  LDF  contracted  research  effort  at  the  University  of  Minnesota.  The 
DIDA  team  roster  is  shown  in  Figure  4.  Mr.  C.  Wagner  assisted  in  planning  and 
enhancing  the  AAAT  DIDA  laboratory  facilities  which  was  to  be  merged  eventually  in 
1983  with  the  current  laser  vision  laboratory  at  AAAT.  A  follow-on  research  effort 
at  the  U.  of  Minnesota  was  funded  by  the  Air  Force  Office  of  Scientific  Research  in 
1983.  The  remainder  of  this  section  will  briefly  summarize  the  documentation 
generated  in  Phase  I.  Sections  IV  and  V  present  research  results  by  the  author  which 
are  not  Included  in  the  other  OIDA  reports  mentioned  below. 
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Figure  3.  OIOA  Program  Plan 
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AFWAL/AAAT  Dr.  Louis  A.  Tamburino 

Mr.  Charles  E.  Wagner 


(Project  Engineer) 
(Part  Time  1981) 


AFIT 


Capt.  Franklin  D.  Cooper  (1981) 

Major  Michael  S.  Gaydeski  (1982) 


U.  Minnesota  Dr.  William  B.  Thompson 
Mr.  Joseph  K.  Kearney 


(1981-1982) 

(1981-1982) 


Figure  4.  DIDA  Roster 
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Capt.  F.  Cooper's  Master's  thesis  entitled  "Disparity  Analysis  of  Time-Varying 
Imagery"  (AFIT/GE/EE/810-1)  attempted  to  develop  a  software  simulation  package  to 
evaluate  point  selection  and  point  matching  algorithms.  In  the  six  month  period 
alloted  to  AFIT  students,  Capt.  Cooper  managed  to  produce  experimental  photographs  of 
his  computer  simulation  experiments  and  obtain  preliminary  results.  While  these 
preliminary  experiments  provide  instructive  insights  into  the  subtleties  of  point 
matching  algorithms,  more  debugging  of  the  software  is  required  before  its  general 
acceptance.  This  work  is  documented  in  a  370  page  thesis,  including  software 
listings.  In  the  course  of  this  work,  AAAT  acquired  for  Capt.  Cooper's  use  image 
data  bases  from  three  prominent  research  centers  which  will  also  prove  useful  in 
future  research. 

Major  M.  Gaydeski's  Master's  thesis,  entitled  "Disparity  Analysis  -  Real-Time 
Determination  of  Object  Motion"  (AFIT/FE/EE/82-1)  is  a  survey  of  image 
transformations  and  features  which  could  prove  useful  in  mapping  grey  level  images 
into  binary  images  via  a  thresholding  function.  This  survey  includes  sections  on 
global  threshold  selection  for  segmentation,  local  segmentation,  and  global  local 
edge  coincidence.  This  literature  survey  was  done  in  support  of  the  adaptive  image 
matching  system  discussed  in  Section  V. 

The  research  accomplished  at  the  University  of  Minnesota  by  Thompson  and 
Kearney  is  documented  in  AFWAL-TR-83-1035  entitled  "DIDA  -  Dynamic  Image  Disparity 
Analysis."  The  objective  of  this  study  was  to  perform  the  initial  analysis  required 
to  implement  a  system  for  determining  disparity  values  in  real-time.  The  final 
report  describes  the  results  of  a  study  of  the  feasibility  of  developing  a  device  for 
the  real-time  estimation  of  motion  induced  disparity  in  image  sequences.  The  report 
describes  the  nature  of  the  estimation  problem  and  suggests  criteria  by  which 
methods  for  estimating  disparity  can  be  evaluated.  It  includes  a  theoretical 
analysis  of  one  class  of  estimation  methods  and  shows  how  such  an  analysis  can  lead 
to  improved  performance.  The  results  obtained  from  a  variety  of  estimation 
algorithms  are  demonstrated  on  a  limited  sample  of  dynamic  imagery.  Finally, 
preliminary  hardware  and  software  requirements  are  provided  for  real-time  disparity 
analysis  devices.  Work  under  this  contract  has  successfully  produced  analytical 
analyses  of  the  intrinsic  limitations  of  several  state-of-the-art  algorithms.  This 
analysis  is  being  used  to  modify  these  algorithms  to  improve  performance.  Several 
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hybrid  approaches  which  can  combine  the  strengths  of  several  different  existing 
approaches  have  been  identified. 
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SECTION  IV 

GRADIENT  STATISTICS  ? 

1.  OISCRETE  GRADIENT  VECTORS 

The  original  motivation  for  studying  gradient  statistics  was  to  develop 
rotational ly  invariant  and  more  efficient  point  selector  operators.  The  gradient 
statistics  formalism  which  evolved  from  this  study  facilitates  not  only  a  unified  and 
comprehensive  treatment  of  interest  measures,  but  also  the  development  of  other 
fundamental  feature  detectors  for  image  understanding  systems.  Initially,  the  study 
developed  simplified  algorithms  for  the  Moravec  and  Hannah  interest  operators; 
however,  it  also  used  the  formalism  to  develop  new  interest  measures  which  are 
rotat  onally  invariant.  The  presentation  of  results  which  follows  is  organized  in  a 
manner  which  reflects  the  chronological  order  of  this  research. 

By  way  of  review,  a  gradient  is  a  vector  denoting  the  magnitude  and  direction 
of  the  maximum  rate  of  change  of  a  scalar.  The  gradient  vector  components  of  a  two- 
dimensional  function  I(x,y)  are  simply  the  partial  derivatives: 

Grad.  I  =  (5l/;X,  6l/*y)  (24) 

or,  rewritten  using  an  abbreviated  notation, 

71  =  (I  ,  i  ) 

X  y  '  • 

If  a  coordinate  rotation  is  given  by 

X1  =  X  cose  +  y  sine  (25a ) 

y’  =  -x  sine  +  y  cose 


then .the  gradient  vector  transformation  is 


I  . 

a  I  cose 

+  I  sine 

x 

X 

y 

I  , 

=  -I  sine 

+  I  cose 

y 

X 

y 

(25b) 
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If,  for  example,  ?  *  45°  then 


r 


X'  =  Ct  =  (x  +  y)/  /2;  Ia  -  (Ix  +  I  )  /  /2 

y‘  -  8  =  (-x  +  y)/  /2;  I&  =  (-1^  +  I  )/  /2  (26a) 

(a,  8  used  below  to  signify  45°  rotations). 


For  this  special  case,  the  new  coordinates  and  components  are  simply  the  sum  and 
difference  divided  by  IT. 


This  brief  review  of  gradients  uses  the  conventional  formalism  for  dealing  with 
analytical  functions.  How  to  apply  this  gradient  formalism  to  digitized  images, 
which  are  discrete  arrays  of  intensity  values,  is  not  obvious.  Consider  the  2X2 
array  of  pixels  depicted  in  Figure  5.  Discrete  estimates  of  the  parital  derivatives 
of  I(i,j)  are  given  by  pixel  differences: 


Ix  =  1(b)  -  1(a)  or  =  1(d)  -  1(b) 


Iy  =  Kc)  -  1(a)  or  =  1(d)  -  1(b) 


(27) 


system  yields 
I  =  (1(d)  -  1(a)  )  /  /2 

Ct 

t  =  (1(c)  -  1(b)  ) 


(26b) 


where  the  factor  Jl  is  the  diagonal  distance  between  the  diagonal  pixels  (assuming 
unit  distance  between  vertical  and  horizontal  neighbors).  If  one  uses  Equation  25b 
to  rotate  the  components  45°,  one  obtains 


I  -  (I  -K)  /  /2  =  [1(b)  +  1(d)  -  1(a)  -  I (c)]/2 
x  a  8 


(28) 


Iy  *  (Io+I0)/^  *  [1(c)  +  1(d)  -  1(a)  -  I(b)]/2 
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Notice  that  this  discrete  approximation  for  the  two  partial  derivatives  differs  from 
Equation  27.  These  partial  derivatives  depend  on  four  pixels  instead  of  two."  The 
Ix  partial  derivative  is  the  average  of  two  estimates  of  Ix  type,  i.e.,  D  -  C  and  B  - 
A;  Iy  is  the  average  of  C  -  A  and  D  -  B.  Although  it  may  appear  that  matters  are 
slightly  more  complicated  by  using  Equation  28  in  place  of  Equation  27,  it  will  be 
shown  that  subsequent  formalisms  are  not  only  rotational ly  consistent,  but  easier  to 
compute.  Definitions  26b  and  28  are  consistent  with  the  requirement  that  vector 
magnitudes  be  rotationally  invariant,  i.e.. 


2  2  2  2  2 

p  •  r  +  r  ■  r  +  i: 

x  y  a  S 


(29) 


The  Sobel  edge  detector,  which  is  one  of  the  most  popular  edge  detectors,  is 
defined  by  two  fundamental  measures: 


Sx  =  [(c  +  2f  +  i)  -  (a  +  2d  +  g) 2/8 
Sy  a  [(9  +  2h  +  i)  -  (a  +2b  +  c)]/S 
where  a,b,...  are  used  in  place  of  1(a),  1(b) . 


g 

h 

i 

d 

e 

f 

a 

b 

c 

One  can  easily  see  that  Sx  equals  the  average  of  the  four  horizontal  gradient 
components  (Equation  28)  derivable  from  the  four  2X2  areas,  which  fit  inside  the  3 
X  3  area.  Similarly,  Sy  is  also  the  average  of  four  estimates  of  the  vertical 
derivatives.  Hence,  one  can  think  of  the  Sobel  operator  as  an  average  over  the  more 
fundamental  2X2  operators.  This  local  averaging  tends  to  eliminate  noise  and 
smooth  out  edges. 


2.  IMPROVED  INTEREST  MEASURE  ALGORITHMS 

If  the  DIRVAR  and  EVAR  interest  measures  defined  in  Section  II  are  reviewed  in 
light -of  the  rotationally  consistent  gradient  definitions  defined  in  this  section, 
several  improvements  are  readily  evident.  The  horizontal  and  vertical  difference 
statistics  (recall  Equation  15)  can  be  interpreted  as 

DO)  -  E[I(i+l,j)  -  I(i,j)]2  -  l  H 

(30; 

0(2)  -  z[I(i ,j+l )  -  I(i,j)]2  »  z  i* 
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where  I  ,  I  are  the  partial  derivative  estimates  are  defined  by  Equation  27.  In  a 

*  J 

similar  manner,  the  diagonal  difference  statistics  can  be  written  as^ 

DO)  -  Z[I(i+l.j+l)  -  I(i,j)]2  =  Z(/2  IJ2  -  (30b) 

0(4)  =  Z[I(i+l,j)  -  l(i,j+l)]2  =  l[A 2  T  )2 
*  0 

where  I  and  I„  are  defined  in  Equation  26b.  If  the  interest  operators  are  to  be 

&  P 

rotationally  invariant,  the  coordinate  system  used  should  make  no  difference;  i.e., 
the  use  of  (I  ,  I  )  and  (I  ,  I  )  should  be  interchangeable.  This  would  mean  the 

ci  p  a  y 

diagonal  differences  D(3)  and  0(4)  should  be  scaled  by  the  factor  1/2  relative  to 
D ( 1 )  and  D ( 2 )  in  order  to  account  for  the  distance  between  diagonal  pixels.  In 
other  words,  by  properly  scaling  differences,  one  compares  the  slope  of  the  intensity 
in  four  principal  directions.  This  is  more  meaningful  and  consistent  than  comparing 
just  the  pixel  intensity  differences. 

Interest  measures  based  on  the  directional  statistics  D(i)  would  be  enhanced 
and  simplified  if  the  following  definitions  are  adopted: 

DO)  *  Z  1 2  0(2)  =  r  I2  (31) 

A  y 

0(3)  *  Z  I2,  0(4)  =  Z  I2 

where  (I  ,  Ifi)  are  defined  by  Equation  26b  and  (Iv,  I  )  by  Equation  28.  It  now 

u  D  a  jr 

follows  from  invariance  of  vector  magnitudes  that  the  newly  defined  D(i)  satisfy 

0(1)  +  0(2)  -  0(3)  +  0(4)  (32) 

hence  only  three  of  the  0's  are  independent.  Equation  32  can  be  used  to  solve  for 
the  fourth  0  which  means  computations  are  effectively  reduced  by  25%. 

The  three  Independent  statistics  which  characterize  the  four  D's  and  link  them 
with  the  formalism  to  be  developed  and  used  in  the  remainder  of  this  paper  are: 

A  «  Z  I2  +  Z  I?  -  D(l)  +  0(2)  -  0(3)  +  D(4) 

a  s 

2  2  (33) 

I  ■  I  I  ■  I  i:  ■  0(3)  -  0(4) 

a  d 
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C  =  I  I2  -  2  I2  =  2  l  I2 
x  y  x 


whet's- 


r 


2  0(1 )  =  [A  +  C,  A  -  C,  A  +  B,  A  -  B]  (34) 

Substitution  of  the  0{i)  from  (Equation  31)  into  the  original  definitions  for 
directed  variance  and  edge  variance  (see  Equations  16  and  17)  yields 

DIRVAR  «  MIN  [0 ( i ) ]  *  (A  -  MX >/2  (35) 

/» 

EVAR  =  VAR  *  MIN  [ D(i )/D(i )]  =  VAR  *  (A-MX)/(A+MX )  (36) 

where  MX  =  MAX  [|B|,  |C|].  (37) 

The  selection  of  the  minimum  value  is  now  accomplished  with  one  compare  operation 
where  originally  it  took  three.  Also  note  that  this  definition  of  EVAR  requires  only 
one  division. 


Both  the  (x,y)  and  (a, 8)  components  of  the  gradient  vector  are  derivable  from 
the  same  2x2  array  of  pixels.  Since  the  2x2  arrays  fit  evenly  into  rectangular 
windows,  an  equal  number  of  components  in  the  four  principal  directions  are  obtained 
within  such  windows.  In  the  original  definitions  of  the  D(i),  this  was  not  the  case 
(see  Equations  15).  The  original  D(i)  were  defined  as  averages  to  compensate  for 
the  variation  in  the  number  of  diagonal  and  horizontal  or  vertical  pixel  differences. 
Hence  the  process  of  sorting  and  averaging  would  require  division  operations.  The 
new  directional  statistics  (Equation  31) always  involve  the  same  number  of  terms,  hence 
they  may  be  sorted  and  compared  without  division  by  the  number  of  terms. 

The  improvements  in  the  computation  of  interest  measures  demonstrate  the 
advantage  to  be  derived  from  consistent  use  of  gradient  statistics.  A  more 
generalized  treatment  of  applications  for  gradient  statistics  follows,  which  will 
Include  new  Interest  operators  which  are  rotational ly  invariant  and  analytic. 
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3.  INTERPRETATION  OF  GRADIENT  HISTOGRAMS 


Given  a  localized  set  of  gradient  vectors,  how  can  one  extract  various 
interesting  image  features  from  this  data?  Originally,  we  used  gradient  histograms  as 
the  basis  of  statistical  interpretation.  Computation  of  these  gradient  histograms 
requires  the  direction  and  magnitude  of  each  vector: 


■  J  ‘a  *  'b  '  7lx  +  >y 


e  =  tan'1  I/I  =  (tan-1  I  /I  )  -ir/4 
y  x  p  ct 


(38) 


The  angular  dimension  is  quantized  into  M  bins,  each  spanning  2^/M  radians.  The 

magnitudes  of  the  vectors  falling  into  each  bin  are  summed 

h !  *  I  oi  (39) 

J  ic  Bin(j) 


and  a  normalized  set  of  weights  obtained  by 


hJ/ 


M 

(  S 
K=1 


h,) 


(40) 


Figure  6  depicts  some  idealized  histograms  and  corresponding  examples  of 
localized  window  images.  Histograms  may  be  characterized  by  various  parameters 
which  ideally  correspond  to  meaningful  features  in  the  localized  images.  One 
example  of  a  useful  parameter  is  entropy. 


M 

E  =  -  z  PK  ln(pK) 


(41) 


which  equals  zero  for  a  single  edge  and  equals  the  maximum  value  (In  M)  for  uniform 
histograms.  The  entropy  measure  does  not  distinguish  between  histograms  b  and  c 
which  have  equal  entropy. 


_ -In  interest  measures,  the  orthogonal  aspects  of  b,  d,  and  e  are  significant. 

The  heed  to  distinguish  histograms  with  large  orthogonal  aspects  leads  to  the 
following  concept  of  directional  weighted  averages  for  each  bin: 

M 

Wj  *  I  PK  |sin(eK  -  0j)|  (42) 
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e.  Uniform 
Distribution, 
Maximum  Entropy 


Figure  6.  Gradient  Histograms 
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The  term  J sin  aq|  is  a  relative  weighting  factor  which  is  maximum  for  ek  orthogonal 
to  and  which  diminishes  to  a  zero  value  as  the  0^  align  with  the  direction  of  6j. 

To  convert  these  directional  bin  averages  into  a  single  parameter,  the  following 
average  of  <dj  was  defined: 

T  -  z  PKwK  =  r  z  PK  Pj  |sln(0j  -  eK)|  (43) 

K  I 

which  is  a  measure  of  the  orthogonal  aspect  of  the  gradients  in  a  sampled  window. 

It  is  easy  to  see  that  T  tends  to  zero  for  cases  like  histograms  a  and  c.  Windows 
with  larger  T  values  have  more  orthogonal  or  omnidirectional  attributes  which  are 
desirable  for  selecting  localizable  and  distinguishable  pixels.  This  line  of 
investigation  was  abandoned  because  the  expressions  are  computationally  cumbersome; 
however,  the  concepts  discussed  above  were  modified  and  incorporated  into  the 
analytical  formalism  presented  next. 

4.  ABC'S  OF  DIRECTIONAL  WEIGHTING  FUNCTIONS 

Several  directional  weighting  functions,  such  as  Equations  42  and  43,  evolved 
from  early  investigation  of  histogram  statistics.  Formally,  the  definitions  of  the 
orthogonal  and  parallel  weighting  functions  are: 

N  22 

wl  (0)  =  I  p.  sin  (9.  -e)  (44a) 

i  =1  1  1 

N  2 

wl 1(e)*  z  pfcos  ( 9 •  -  e)  (45a) 

i=l  1  1 

where  N  *  Number  of  gradient  samples. 

ci»  *  Magnitude  and  direction  of  it*1  vector  (see  Equation  38). 

Note -that  these  definitions  do  not  depend  on  the  establishment  of  angular  bins  such 
as  were  used  in  histogram  formalism.  It  is  an  easy  exercise  to  derive  the  following 
expressions  for  the  weighting  functions: 

2wKe)  ■  A  -  B  sin  2e  -  C  cos  29  (44b) 

2w I | ( 9 )  ■  A  +  B  sin  29  +  C  cos  29  (45b) 
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where 


6  ■ 1 A sin  28i  ■ r  cf  -  's’ 

C  »  e  p2  cos  29.  -  e  Cl l~  l2y) 


(same  as  Equation  33) 


Utilizing  sin2  Ae  as  a  relative  weighting  factor  in  w!  instead  of  |sin  A9|  as  used 
in  Equation  42  provides  an  analytic  function  of  0  with  the  same  A,B,C  statistics  that 
were  utilized  to  simplify  the  derivation  of  interest  measures  DIRVAR  and  EVAR 
(Equation  33-36) . 

The  weighting  function  wl(8)  is  a  measure  of  the  extent  to  which  the  sample 
gradients  are  orthogonal  to  a  given  direction  (wi(0)*wl(e+ir)).  The  tendency  of 
vectors  to  align  themselves  in  a  given  direction  9  is  given  by  the  weighting  function 
w | |(e).  These  dual  weighting  functions  are  both  obtained  for  the  price  of  computing 
the  A,B,C  statistics.  The  sum  of  wj_and  w, ;  equals  A,  which  is  independent  of  the 
relative  direction  of  the  gradient  vectors.  For  later  reference,  we  rewrite 
Equations  44  and  45  in  a  form  that  readily  depicts  the  max  and  min  values  for  these 
dual  functions: 


2  w±(e)  *  A  -  R  cos  2 (6-r )  (44c) 

2w!  i  ( e)  =  A  +  R  cos  2  (9-t)  (45c) 


where 


2t  *  tan  B/C 


R2  »  B2  +  C2 


(46a) 


(46b) 
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5.  ORIENTATION  ASPECTS 

We  now  define  a  new  parameter  called  the  orthogonal  aspect: 


p?  wl(8i ) 


N  N  2  2  2 

Z  E  p.  p„  sin  (9. 

•  i/l  ^  « 


Other  expressions  are: 

2  2  2  2  2 
2T  *  A  -  B  -  c  =  A  -  R 


(47a) 


(47b) 


where 


T  =  21  I  (vl.  x  VI,,)  •  (VI  x  vl„) 
i  <  K  1  K  1  K 

■» 

VI.  ■  [p.  cos  p.  sin  6^. 


(47c) 


We  see  that  T  which  is  a  simple  function  of  the  gradient  statistics  A,  B,  C,  is  the 
continuous  analogue  to  histogram  average  T  (compare  Equations  43  and  47a).  Equation 
47c  expresses  T  as  the  sum  of  the  square  magnitudes  of  the  cross  products  between  all 
gradient  vector  pairs.  Recall  that  the  vector  cross  product  between  two  vectors  is 
another  vector, 

V.  x  V*K  =  W  (48) 

such  that 

1 w |  *  !v. |  |VKI  I  sin  (ei  -  eK)| . 


Recall  that  cross  products  between  parallel  or  antiparallel  vectors  vanish.  Hence 
terms  with  i  *  j  vanish  in  Equations  47a  and  c. 

5t-  ■ 
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There  are  two  possible  definitions  for  the  parallel  aspect,  both  of  which  are 
dual  to  the  definitions  of  the  orthogonal  aspect  given  in  Equation  47.  The  first  __ 
version,  L,  is  a  function  of  A,  B,  and  C:  Z 


N  2  N  N  «  « 

L  =  E  p.  w.,  (9.)  =  I  I  of  COS^  (0. 
i  1  h  1  i  K  1  1 


V 


(49a) 


2  L  =  A2  +  B2  +  C2  =  A2  +  R2 


L  *  T  -  A2  -  C  £  o?]2 


L  -  T  -  R?  «  I  I  (,2  p2  cos  2  (9,  -  eK! 


(49b) 

(50) 

(51) 


The  second  version.  A,  is  a  counterpart  to  Equation  47c: 


A  =  2  E  E  (vl.  •  VIJ2 
i  <  K  1  K 


A  -  L  -  D 


(52a) 

(52b) 


where 


D  *  E  pj  =  E  (VI.  •  VI.)2 


(53a) 


is  a  new  gradient  statistic.  The  ambiguity  in  the  parallel  aspect  definition  arises 
from  the  fact  that  unlike  a  cross  product,  the  dot  product  of  a  vector  with  itself 
is  not  identically  zero.  Equation  51  includes  the  self  products,  whereas  Equation  52 
excludes  them.  An  alternative  expression  for  the  sum  of  self  products. 


E  p 


1 


*  <*a  + 


(53b) 


7.  7 

illustrates  that  the  computation  of  0  involves  terms  I  and  which -are  used  to 

Cl  M 

compute  A  and  B  so  that  the  extra  effort  to  compute  D  is  minimal. 
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Other  measures  are  derivable  from  the  statistics  defined  above: 

S  =  A  +  T  =  A2  -  D  =  2  T  Z  P-  P* 

i  <  K 

"  2  7  7 

~n=A-T  =  R-  D  =  2  Z  I  pf  pi  COS  2  (0-  -  6„) 

.  V  i  <  K  1  K  1  K 


where 

-1  <  Q/S  <  1 . 


(54) 

(55) 


The  following  three  ratios  may  be  interpreted  as  averages  of  various  trigonometric 
functions  of  Ae..^  =  0..  -  with  the  product  of  magnitudes  (p^  pk)  utilized  as  the 
weighting  parameter: 


Q/S  *  <cos  2a8>  =  cos  2a0 

A/S  *  <cos2  A6>  =  cos2  A9  (56) 

2  2  — 

T/S  =  <sin  A0>  *  sin  A0 


where 


0  <_  A0  <  n. 

Figure  7  depicts  various  sample  sets  of  unit  vectors  and  corresponding  statistical 
measures,  which  include  limits  for  Q/S  3  _+  1 .  Statistics  derivable  from  A,  B,  C,  and 
D,  which  are  sums  of  terms  with  even  powers,  do  not  distinguish  between  parallel  and 
anti  parallel  vectors  as  depicted  by  seemingly  different  sets  with  identical  even  order 
statistics. 


The  mean  gradient  vector  (G,  $) 

N 

G  cos  $  *  <1  >  *  Z  p4  cos  #4/N 
a  .  1  1 


G  sin  $  •  <Ig>  *  lD|  sin  $j/N 


(57) 

(58) 
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Figure  7.  Gradient  Statistics:  Examples  use  unit  vectors  in  eight  principle  directions 
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is  useful  for  characterizing  the  parallel /anti parallel  aspects  of  the  gradient  sets. 
The  direction,  $,  of  the  mean  gradient  provides  an  invariant  reference  direction. 

The  projection  of  the  mean  gradient  vector  onto  any  direction,  e,  is  .given  by 

M(e)  =  x  p.  cos  (9.  -«■  -0  )/N  =  G  cos  -  e) 

2  N  2 

G^  =  x  PK  M(eK)/N  =  x  r  p1  PK  cos  (0i  -  0k)/n 
K  K  l 

Equations  59  and  60  demonstrate  more  explicitly  the  dependence  of  G  on  the  extent 
to  which  the  vector  pairs  are  parallel  or  antiparallel.  Note  that  while  the  mean 
gradient  vector  is  invariant,  the  vector  components  depend  on  the  coordinate  system 

used  to  define  them.  If  <1  >  and  <1  >  were  used  in  Definition  57,  G  would  be  the 

x  y 

same  and  the  reference  angle,  $,  would  be  changed  by  w/4  in  the  rotated  coordinates. 


(59) 

(60) 


6.  NEW  INTEREST  MEASURES  -  -  -  -  - 

In  this  segment,  we  shall  explore  interest  measures  utilizing  the  gradient 

formalism  already  introduced.  The  modified  directional  statistics  D(i)  utilized  in 
the  definitions  of  DIRVAR  and  EVAR  (see  Equations  34-36)  can  be  expressed  in  terms 

of  (see  Equation  44b):-  -  -  - -  - -  - 

_  _____  i 


2D(1 )  *  wK3*/4)  -  A  +  B  .. 
-2D (2)  =  wli3w/4)  =  A  B 
20(3)  »  wl(ir/2)  *  A  il 
20(4)  _=  wi(0)  3  A  _-_C  _ 


so  that 


■^2 JIlRVAR  ■  HJ N  Iwi(0T,-wlU/4),~wi(ir/2),  wl(37r/4J?3 - 


(34a) 

(35a) 

(36a) 
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Note  that  these  measures  depend  on  wl{e)  evaluated  in  four  principal  directions, 
which  in  turn  depend  on  the  particular  coordinate  system  -in  use. —Thg,  continuous  ant 
rotationally  invariant  analogues  to  these  equations  are: 

~  2  DIRVAR'  =  MIN  [wife)]  =  wl(min)  -  A  -  B  .  (61) 

2  EVAR*  =  VAR  *  wi(min)/wj_(max)  =  VAR  (A  -  R)/(A  +  R)  (62) 

where  the  angle  between  wi(max)  and  wjl(min)  is  tt/2.  Here  we  have  replaced  MX  in 
Equation  38  and  39  with  R  =  /82  +  C2. 


Reexpressing  the  orthogonal  aspect  in  terms  of  w_L(max)  and  wj_(min)  yields 


- T  =  A2  -  R2  =  wl(max)  *.wlj[min)  .  _ _ _  (63) 

which  suggests  that  T  be  considered  as  an  interest  measure  for  point  selection.  The 
interest  measures  defined  in  the  last  three  equations  are  all  analytic,  rotationally 
invariant,  and  proportional  to  wj_(min).  The  relative  merits  of  the  various  new 
measures  introduced  in  this  paper  await  future  computer  simulation  experiments  using 
different  types  of  imagery.  .... 
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SECTION  SUMMARY- _ 


In  classical  pattern  recognition,  one  extracts  a  set  of  features  (.feature  vectors) 
which  is  used  to  distinguish  between  predetermined  classes  of  subimages  or  textured 
patterns.  Effective  features  provide  good  separation  of  the  predetermined  classes  in 
the  multidimensional  feature  space.  -In  image  analysis  (or  image  understanding),  one 
extracts  locally  derived  features  (image  primitives)  and  attempts  to  integrate  the 
simpler  primitives  into  more  complex  and  globally  extended  structures*,  which  are_ 
meaningful  to  the  applications  at  hand.  The  basic  primitives  are  edges,  texture, 
contrast  (all  of  which  are  to  some  extent  characterized '  by  gradients T^and  color.  ~~ 


-  -Sets  «f  image  primitives -or  statistical  features  may  be  relatively  easy  to  compute 
as  locally  derived  functions  of  the  A,  B,  C,  and  0  statistics,' along  with  first  order 
statistics  (G,$). —  Dual  -direc*4ena4  -statistics-have  -been  discussed  which  .measure _ 

;  f 

.relative  orientations  between  pair's  of  gradient  vectors.  A ‘complete  evaluation  of' 
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these  aspects  and  other  rotational ly  invariant  functions  will  require  further 
experimentation  andd  computer  analysis  of  digitized  images.  The  following  brief 
summary  of  application  areas  for  gradient  statistics  includes  new  areas  for  further 
research: 

1.  Point  selection  (interest)  operators.  These  were  fully  discussed, 
including  old  and  new  definitions. 

2.  Local  edge  detectors.  By  definition  these  detectors  are  functions  of 
local  gradients.  See  the  discussion  on  the  Sobel  edge  detector  in  this  section. 

3.  Image  segmentation.  To  formulate  more  extensive  and  unified  image 
features,  images  are  segmented  into  regions  with  uniform  or  similar  features.  Color 
is  a  good  feature  to  use.  The  use  of  uniform  gradient  statistics  may  also  apply  to 
the  segmentation  problem. 

4.  Feature  detection.  A  preliminary  survey  of  the  values  of  T  and  L  for 
small  binary  images  inicates  that  T  is  sensitive  to  corners  and  crosses,  while  L  is 
sensitive  to  straight  lines  and  fibrils  (antiparallel  sets  of  lines).  This  is 
encouraging;  however,  to  fully  explore  the  discriminating  power  of  the  directional 
aspects,  interactive  experiments  with  grey  level  images  are  required.  These 
operators  can  be  used  to  cue  areas  of  potential  interest. 

5.  Textural  analysis.  Although  not  discussed  in  this  paper,  this 
subject  has  been  an  active  research  topic  for  the  last  25  years  or  so;  hence  there 
is  an  extensive  literature  available  on  it.  The  techniques  for  analyzing  and 
synthesizing  texture  in  images  are  generally  divided  into  structural  and  statistical 
methods.  The  first  approach  looks  for  repetitive  patterns  of  image  primitives  such 
as  lines,  simple  shapes, etc.  The  statistical  approach  utilizes  statistical 
parameters  which  characterize  the  distribution  patterns  and  spatial  relationships 

between  pixel  gray  levels.  Some  techniques  use  extensive  histograms  called  Spatial- 

- 

Grey-Level  Dependence  Matrices  (Reference  14,  p.  186)  which  quantify  .differences  not 
just  of  adjacent  pixels,  but  of  pixels  with  different  relative  separations.  Other 
techniques  analyze  polar  histograms  obtained  from  partitioning  the  Fourier  power 
spectrum.  These  techniques  require  extensive  computations.  The  relatively  simple 
gradient  features  outlined  above  may  provide  simpler  alternative  approaches  for 
working  with  texture.  ~ 
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SECTION  V 

ADAPTIVE  IMAGE  MATCHING  ALGORITHM 

The  problem  of  computing  complete  image-to-image  mapping  in  real  time  is 
unsolved.  In  fact,  computing  a  complete  disparity  field  is  not  well  understood. 

Point  matching  algorithms  are  robust  and  computationally  extensive,  but  cannot  solve 
the  complete  mapping  problem  by  themselves.  One  attempt  to  achieve  complete  mapping 
uses  spatial-temporal  gradient  analysis  with  a  smoothness  constraint  (see  Section 
II).  The  smoothness  constraint  results  in  iterative  equations  to  "fill  in"  uniform 
intensity  regions.  The  problem  with  this  approach  is  that  the  fill-in  process  needs 
to  start  from  high  contrast  boundary  regions  in  which  the  smoothness  constraint  is 
weakest.  The  experimental  approach  as  presented  here  also  addresses  the  complete 
mapping  problem. 

An  adaptive  binary  image  matching  algorithm  may  be  characterized  as  a  blob 
matcher.  In  this  approach,  a  uniform  region  is  not  viewed  as  something  to  be  filled 
in  or  as  a  region  devoid  of  matchable  points,  but  as  a  unique  feature  that  can  be 
isolated  in  separate  images  and  compared.  In  viewing  aerial  photographs,  a  lake  will 
stand  out  as  a  distinctive  uniform  area,  or  blob,  which  is  a  key  feature  in  one's 
initial  registration  of  multiple  images.  Unfortunately,  there  are  not  enough  of 
these  naturally  occurring  blobs  to  provide  a  complete  disparity  map.  The  proposed 
approach  investigates  means  for  generating  artificial  blobs.  In  this  way,  the  blob 
matching  process  can  proceed  until  a  completely  integrated  picture  of  the  disparity 
field  is  accumulated. 

The  problem  of  generating  artificial  blobs  is  equivalent  to  transforming  grey- 
level  images  into  binary  images.  In  binary  images,  the  concept  of  a  blob  is  obviousj 
i.e.,  a  blob  is  an  island  of  black  pixels  in  a  sea  of  white  pixels.  There  are  an 
’"■finity  of  such  image  transformations.  Generally,  binary  images  are  achieved  with 
the  following  two  steps:  transforming  the  grey-level  image  into  a  different  image 
representation,  and  thresholding  the  new  image  representation.  All  conceivable 
transformations  are  not  equally  suitable.  Sought  are  binary  mapping  transformations 
which  have  high  expectations  that  corresponding  physical  elements  will  map  into 
corresponding  blobs.  For  example,  selecting  the  brightest  regions  (or  darkest)  in  an 
image  will  generally  be  expected  to  correspond  to  the  same  surface  elements  in 
another  image. 
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An  adaptive  image  matching  system,  together  with  experimental  support 
facilities,  is  depicted  in  Figure  8.  Inputs  to*the  system  are  two  images,  I  and  I', 
and  the  output  is  a  complete  disparity  vector  field,  D,  which  maps  the  corresponding 
points  from  one  image  to  the  other.  The  input  images  are  subjected  to  a  sequence  of 
image  transformations,  T(I),  each  of  which  may  be  transformed  into  one  or  more 
binary  images  by  thresholding.  Each  pair  of  binary  images  B(I)  and  B(I')  is 
compared.  The  relative  displacement  of  elements  in  the  images  provides  information 
about  the  disparity  field,  AD,  over  the  area  of  the  blobs,  thus  providing  a  piece  of 
the  disparity  puzzle.  Some  of  the  pieces,  however,  may  overlap  and  could  have 
somewhat  different  estimates  for  the  disparity  field  in  these  overlapping  regions. 

All  partial  results  are  accumulated  and  integrated  in  the  final  block  in  Figure  8 
which  also  provides  executive  control  over  the  sequencing  of  transforms, 
thresholding,  and  interfacing  with  the  interactive  work  station  containing  an 
extensive  software  library. 

The  functional  blocks  in  Figure  8  should  be  viewed  as  software  modules  for  near 
term  simulation  and  hardware  modules  for  a  future  prototype  real-time  image 
processor.  A  realistic  research  program  should  constantly  attempt  to  match  simulated 
algorithms  with  identifiable  hardware  technology.  The  scope  of  this  experimental 
approach  is  very  broad,  so  that  a  hardware  implementation  of  successful  simulated 
algorithms  could  evolve  into  an  experimental  test  bed  for  promising  image  processing 
technology. 

A  workstation,  built  around  a  computer  such  as  the  VAX-11/780,  provides  a 
general  purpose  research  and  development  tool  for  various  laboratory  projects,  as 
well  as  the  basis  for  the  matching  system.  Special  care  in  designing  efficient  data 
structures  and  management  procedures  is  needed  because  of  the  large  flow  of  data 
involved  in  accumulating  the  complete  disparity  field.  The  library  should  be  stocked 
with  published  algorithms  and  available  software  packages.  This  includes  a  package 
to  simulate  cellular  logic  or  systolic  arrays  to  explore  morphological  image  analysis 
for  comparing  binary  images. 

The  initial  set  of  transforms  used  in  the  matching  experiments  will  consist  of 
commonly  used  algorithms  utilized  In  pattern  recognition  and  image  understanding. 
These  tried  and  tested  algorithms  can  be  expected  to  highlight  physically  meaningful 
attributes,  such  as  texture  measures  and  gradient  statistics.  Color,  of  course. 
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Figure  8.  Adaptive  Image  Matching  System 
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would  be  an  excellent  feature  to  use  if  available.  By  keeping  histories  of 
performance,  each  transform  can  be  evaluated  and  indexed.  Given  the  ability  to 
measure  performance,  automatic  means  for  learning  new  and  useful  transforms  can  be 
developed  for  enhancing  the  system's  repertoire.  More  sophisticated  learning 
algorithms  can  learn  to  select  a  sequence  of  transforms  to  match  a  priori  or 
contextual  knowledge  about  the  images  being  matched.  Most  of  the  image  transforms 
will  be  local  neighborhood  or  window  functions  to  facilitate  real-time  implementation 
schemes.  A  special  module  is  envisioned  for  extracting  global  features  and 
histograms  from  the  images,  in  order  to  supply  the  contextual  information  for  the 
adaptive  transform  generation. 

The  feed  back  path  from  the  comparator  to  the  threshold  module  would  be  used  to 
vary  the  threshold  settings  in  order  to  achieve  convenient  blob  density  and  sizes. 

The  masking  ability  would  be  used  to  perform  blob  matching  or  to  concentrate 
attention  in  those  regions  which  are  sparsely  represented  in  the  disparity  field 
accumulator.  It  may  be  feasible  to  generate  a  confidence  measure  for  each  entry  in 
the  accumulator.  Similar  estimates  derived  from  different  transforms  should  increase 
these  confidence  measures.  Different  estimates  will  require  arbitration. 

The  use  of  binary  image  matching  has  been  presented  as  a  promising  alterative 
for  accumulating  complete  disparity  maps.  Image  matching  systems  of  the  future  will 
be  hybrid  systems  incorporating  a  variety  of  matching  algorithms  such  as  reviewed  in 
Section  II.  The  velocity  constraint  equation  (vi.v  +  <sl/ <St  *  0),  for  example,  could 
help  delineate  corresponding  blobs  and  provide  independent  measurements  for  cross¬ 
checking  the  accumulating  matching  results.  The  technical  disciplines  needed  to 
implement  real-time  systems  include  image  understanding,  artificial  intelligence,  and 
dynamic  image  analysis,  plus  super  computers  for  Image  processing.  All  of  these 
disciplines  are  heavily  supported  by  DOD.  One  of  the  original  objectives  for 
creating  the  OIDA  program  was  to  help  establish  a  center  for  machine  perception. 

This  center  would  in  turn  provide  an  alternative  and  effective  means  for  harvesting 
the^sults  of  DOD's  research  Investment.  _ 
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